feat: agent-driven policy management MVP by zredlined · Pull Request #1151 · NVIDIA/OpenShell

zredlined · 2026-05-04T18:24:54Z

Summary

MVP of agent-driven policy management — the loop where an in-sandbox agent
hits a policy block, drafts a narrow rule, submits it via policy.local,
and the developer approves out-of-band. Implements the vertical slice
described in RFC 0001 and
tracked under #1062, with the build plan in
architecture/plans/agent-driven-policy-management-v1.md.

The full loop demoed end-to-end with Codex inside an OpenShell sandbox
writes a markdown file to GitHub via a proposal-and-approval round-trip
that takes about two minutes total.

Related Issue

Refs #1062.

Demo output

Real run from a local dev gateway against a scratch GitHub repo. The whole
loop is bash examples/agent-driven-policy-management/demo.sh with no
arguments — defaults resolve from gh and ~/.codex/auth.json.

==> Preflight

  gateway:  connected · 0.0.37-dev.66+g91014bce
  github:   <owner>/openshell-policy-demo @ main (ab75c15)
  providers created (codex, github) — credentials injected as env vars only

==> Run summary

  repo:     <owner>/openshell-policy-demo
  branch:   main
  target:   openshell-policy-advisor-demo/<run-id>.md
  sandbox:  policy-demo-<run-id>

==> Launching sandbox; agent will hit a policy block and draft a proposal

  initial policy:  read-only access to api.github.com (no PUT)
  agent task:      PUT /repos/<owner>/openshell-policy-demo/contents/...
  live log:        /tmp/openshell-policy-demo.XXXXXX/agent.log

==> Waiting for the agent to draft a policy proposal

  Inside the sandbox right now:

    [1] agent: curl -X PUT https://api.github.com/repos/<owner>/.../contents/...
    [2] L7 proxy denies the write and returns a structured 403 the
        agent can parse and act on:
        {
          "error":      "policy_denied",
          "layer":      "l7",
          "method":     "PUT",
          "path":       "/repos/<owner>/.../contents/<run-id>.md",
          "rule_missing": { "type": "rest_allow", "host": "api.github.com", "port": 443, "method": "PUT", ... },
          "next_steps": [
            { "action": "read_skill",      "path": "/etc/openshell/skills/policy_advisor.md" },
            { "action": "submit_proposal", "url":  "http://policy.local/v1/proposals" }
          ]
        }
    [3] agent reads the skill, drafts a narrow addRule for exactly that path
    [4] agent POSTs the proposal to http://policy.local/v1/proposals
    [5] supervisor forwards it to the gateway as a pending draft

  Polling for the pending draft...

  proposal received:
  Binary: /usr/bin/curl
  Rationale: Allow /usr/bin/curl to write one GitHub Contents API file for run <run-id> in <owner>/openshell-policy-demo only.
  Endpoints: api.github.com:443 [L7 rest, allow PUT /repos/<owner>/openshell-policy-demo/contents/openshell-policy-advisor-demo/<run-id>.md]

==> Approving and waiting for the agent to retry

  OK 1 chunk(s) approved, 0 skipped. Policy version: 2
  agent retried after policy hot-reload — write succeeded

==> Verifying GitHub write

  file: openshell-policy-advisor-demo/<run-id>.md
  url:  https://github.com/<owner>/openshell-policy-demo/blob/main/...

==> Policy decision trace (OCSF)

  [t+0]  HTTP:PUT [MED] DENIED  PUT http://api.github.com:443/repos/.../...md [policy:github_api_readonly engine:l7] [reason:L7_REQUEST deny PUT ...]
  [t+26] CONFIG:LOADED [INFO] Policy reloaded successfully [policy_hash:...]
  [t+50] HTTP:PUT [INFO] ALLOWED PUT http://api.github.com:443/repos/.../...md [policy:github_api_readonly engine:l7]

✓ Demo complete.

Two things worth calling out from this output:

The Endpoints: line shows the actual L7 grant — [L7 rest, allow PUT /repos/.../...md]. Codex submits a method/path-scoped REST rule, not a broad L4 allow. Before this PR, openshell rule get rendered only host:port and dropped protocol, access, and the rules array, so a developer at approval time couldn't tell L4 from L7. The CLI rendering fix surfaces what the agent already submits.
The structured 403 body is the contract. It carries layer, method, path, rule_missing, and next_steps — enough for the agent to recover without prompt scaffolding telling it which file to read. Reading the skill is one of the next_steps the response itself names.

Changes

`feat(sandbox): wire policy.local denials to OCSF JSONL log`

GET /v1/denials?last=N on the sandbox-local API now reads the OCSF
JSONL log at /var/log/openshell-ocsf.YYYY-MM-DD.log, filters to network
and L7 denials (action_id=2, class_uid 4001/4002), and returns a
compact summary newest-first. Default limit 10, capped at 100. Runs in
spawn_blocking so file I/O does not stall the policy.local handler.
POST /v1/proposals now uses the typed grpc_client wrapper instead of
raw_client. Wrapper return type extended to the response struct so
accepted/rejected counts surface uniformly.
Dropped the add_rule snake_case alias in proposal JSON; canonical
form is addRule, matching PolicyMergeOperation convention used
elsewhere in the codebase.
skills/policy_advisor.md updated to document the now-real
/v1/denials?last=10 endpoint and use addRule consistently.

`feat(cli): show L7 protocol/method/path in rule get output`

format_endpoint() previously rendered only host:port, dropping
protocol, access, and the L7 rules array. That made
openshell rule get text output unable to distinguish a broad L4 grant
from a method/path-scoped L7 REST rule — exactly the distinction a
developer needs at approval time.

New rendering tags each endpoint with its enforcement layer and surfaces
allow/deny rules:

bare L4:           api.example:443 [L4]
L7 read-only:      api.example:443 [L7 rest, access=read-only]
L7 method/path:    api.example:443 [L7 rest, allow PUT /v1/foo/bar]

Pure display change: no proto, gateway, or behavior changes. Unit test
covers all three cases with synthetic fixtures.

`refactor(examples): rewrite policy demo as Codex-default loop`

Re-shaped examples/agent-driven-policy-management/ as a single, clean
end-to-end demonstration with smart defaults — bash demo.sh works after
gh auth login and codex login, with no .env ceremony or required
arguments. Defaults resolve from gh (owner via gh api user, repo
defaults to openshell-policy-demo, token from gh auth token).

Demo output narrates the loop for a developer reading along: structured
deny body the agent receives, the agent's drafted proposal (now showing
the L7 method/path), the policy hot-reload, and the OCSF trace at the end
filtered to the three story-relevant events.

Moved the deterministic no-LLM regression harness out of examples/ into
e2e/policy-advisor/ — it was a parallel demo, not an example. Same loop
without the LLM, useful for iterating on the proxy and policy.local API.

The README documents the trust model honestly: structured rule is the
contract, agent rationale is a hint, prover validation badge in progress
per RFC 0001 Phase 3.

Testing

cargo test -p openshell-sandbox --lib (650 tests, all pass; 13 in
policy_local::tests)
cargo test -p openshell-cli format_endpoint (renderer unit test
covers L4, L7 read-only, L7 method/path)
cargo clippy -p openshell-sandbox --lib --tests -- -D warnings (clean)
shellcheck clean on demo.sh, sandbox-agent.sh,
e2e/policy-advisor/test.sh, e2e/policy-advisor/sandbox-runner.sh
Manual end-to-end: bash examples/agent-driven-policy-management/demo.sh
against a local gateway with Codex auth and a scratch GitHub repo.
Confirmed Codex submits a method/path-scoped L7 REST rule (visible
after the CLI rendering fix) and that hot-reload + retry works.

Update (review iteration, May 6)

After the initial draft and a quick sync with @johntmyers on the design,
two follow-up commits land on this branch:

docs(policy-advisor): refresh L7 deny note for structured 403 contract
— architecture/policy-advisor.md line 47 was stale (still framed L7
deny as future work under feat: LLM-powered PolicyAdvisor agent harness for intelligent policy recommendations #205). Updated to reflect the structured 403
body that this PR delivers against feat(policy): add agent-readable L7 deny body #1090; LLM enrichment remains the
only piece left for feat: LLM-powered PolicyAdvisor agent harness for intelligent policy recommendations #205.
feat(sandbox): switch /v1/denials to shorthand log pass-through — the
sandbox-local /v1/denials endpoint previously parsed
/var/log/openshell-ocsf.*.log (OCSF JSONL), which is opt-in via
ocsf_json_enabled. By default the endpoint returned an empty list with
a "log not enabled" hint, breaking the inspect_recent_denials step in
the structured 403's next_steps array out of the box.

Two ways to fix this: flip ocsf_json_enabled=true by default, or read
the always-on shorthand log instead. The first opens a defaults /
compliance discussion (every sandbox would persist a daily-rotated
audit trail on upgrade). The second is what the team picked: read the
shorthand log and pass raw lines through to the agent — same content,
no defaults change, easier to extend (adding fields to the shorthand
renderer is a single-file change here, no schema rev).

Verified end-to-end against a fresh sandbox: /v1/denials now returns
an array of raw shorthand strings, newest-first, with log_available: true out of the box.

The principal-engineer review also caught two correctness issues that
are addressed in the same commit:
- UTF-8 boundary: byte-naive &line[..MAX] would panic on
  multi-byte sequences. Now uses an is_char_boundary walk-down via
  truncate_at_char_boundary, with a multi-byte test.
- Query-string leak (consumer-side fix): the FORWARD deny path in
  proxy.rs populates OcsfUrl::new(...) and .message(...) with the
  raw request path including ?query=..., unlike the L7 path which
  uses redacted_target. To keep secrets out of /v1/denials while
  the upstream emit sites are tightened separately,
  redact_query_strings strips ?<query> to ?[redacted] from each
  surfaced line, before truncation so a cut cannot slice mid-secret.
  Hardening the upstream emit sites in proxy.rs so the on-disk
  shorthand log is itself clean is tracked as a follow-up — the
  consumer-side redaction is defense-in-depth.

Net result: the agent loop ergonomics are now live by default with no
sandbox setting changes, and the pre-existing FORWARD-path leak is gated
out of the new agent surface.

Checklist

Follows Conventional Commits (feat(sandbox), feat(cli),
refactor(examples))
Commits signed off — happy to amend with --signoff if maintainers
want; matched the existing branch style for now
Architecture docs current — architecture/policy-advisor.md updated
to reflect the structured 403 deny body delivered against feat(policy): add agent-readable L7 deny body #1090;
RFC 0001 and the build plan describe the broader scope.
Adds neither prescriptive prompt scaffolding nor per-host heuristics.
The renderer surfaces what the agent submitted; if a future agent
defaults to L4 against a known-REST host, that signal belongs in the
gateway-side prover (Phase 3), not in the prompt.

Out of scope (deferred per the build plan)

Multi-sandbox push inbox (TUI polling 2s for now)
Slack / web inbox adapters
Supervisor UDS API (sandbox-local HTTP at policy.local is the agent
surface this PR ships)
In-process prover optimization
Org ceiling and trusted external auto-apply
Per-binary auto-approve patterns

Side note

While testing this branch I noticed examples/multi-agent-notepad/demo.sh
regressed against main after #952 / #1028 changed --upload <dir>:<dir>
semantics. Filed as #1147 with a five-line suggested fix. Not in scope here.

copy-pr-bot · 2026-05-04T18:24:58Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Update architecture/policy-advisor.md to reflect that L7 per-request denials now ship as a structured 403 body (layer/method/path/host/port/ binary/rule_missing/next_steps), not as untyped tracing output. Cite RFC 0001 / #1151 for the contract; LLM-powered enrichment remains future work under #205.

zredlined · 2026-05-07T15:26:17Z

Rebased onto origin/main (now fe41d679) to resolve the conflict. Force-pushed; new head is 84f4e29f.

Three real conflicts during the replay:

Cargo.lock (lockfile-refresh commit, third in the series) — took main's lockfile and verified cargo check -p openshell-sandbox --lib still resolves cleanly against this branch's Cargo.toml.
crates/openshell-cli/src/run.rs (the L7-rule-rendering commit feat(cli): show L7 protocol/method/path in rule get output) — pure import-list collision in the test module's use super::{...}. Kept main's three new symbols (dockerfile_sources_supported_for_gateway, gateway_env_override_warning, resolve_from) and this branch's new format_endpoint. Verified all four exist as actual symbols in the file at this commit.
Semantic mismatch with feat(policy): add GraphQL L7 inspection #1083 (GraphQL L7 inspection) — main added new fields to L7Allow, L7DenyRule, and NetworkEndpoint, plus a sixth DenyResponseContext parameter to RestProvider::deny_with_redacted_target. Git didn't flag these as text conflicts; they showed up as compile errors after the rebase. Folded into one cleanup commit on top:
- 84f4e29f chore(sandbox): align proto inits with main's L7 GraphQL additions
- crates/openshell-sandbox/src/l7/relay.rs — two deny_with_redacted_target call sites (REST and GraphQL deny paths) now pass DenyResponseContext { host, port, binary } matching the pattern at line 581.
- crates/openshell-sandbox/src/policy_local.rs — L7Allow, L7DenyRule, and NetworkEndpoint initializers populate the new GraphQL/path-scoping fields with empty defaults. Agent-authored proposals from policy.local target REST/SQL/L4 today; GraphQL operation matching is set on the gateway side or via direct YAML, so empty defaults are correct here.

Verified locally:

cargo test -p openshell-sandbox --lib → 650 pass, 2 ignored
cargo clippy -p openshell-sandbox --lib --tests -- -D warnings → clean
Re-ran the e2e flow against a fresh sandbox: /v1/denials still returns raw shorthand strings newest-first; structured 403 deny body unchanged; proposal submit + approve + retry cycle still works end-to-end.

The two recent commits on top (676934e4 docs/architecture refresh and c8e5c6a9 shorthand pass-through with redaction + UTF-8 safety) are unchanged in content; just rebased SHAs.

zredlined · 2026-05-08T03:04:52Z

Rebased again — main moved another 20 commits forward and PR #1184 (architecture-doc reset) deleted architecture/policy-advisor.md, the file my prior docs(policy-advisor) commit edited. New head is f48f4c87.

Changes during this rebase:

Dropped 676934e4 docs(policy-advisor): refresh L7 deny note for structured 403 contract — target file is gone from main. Same context already lives in RFC 0001 in this branch; no content lost.
Resolved 1 conflict in crates/openshell-cli/src/run.rs — kept main's refactored CLI test imports while preserving the branch's format_endpoint L7 rendering helper and unit coverage.
Verified: cargo test -p openshell-sandbox --lib (595 pass, 1 ignored), cargo clippy -p openshell-sandbox --lib --tests -- -D warnings (clean).

Ready for review.

github-actions · 2026-05-08T18:06:55Z

Label test:e2e applied, but pull-request/1151 is at {"messa while the PR head is 63b5288. A maintainer needs to comment /ok to test 63b528804a392edd0ddf2ab748b22fc8ec32baf1 to refresh the mirror. Once the mirror catches up, re-run Branch E2E Checks from the Actions tab.

johntmyers · 2026-05-08T18:08:56Z

/ok to test 63b5288

Three changes addressing review feedback before merging the agent-driven policy management MVP: - Distinguish "OCSF JSONL enabled, no denials" from "OCSF JSONL disabled, nothing to read." The endpoint now returns a `log_available` flag and an explanatory `note` when the log file is missing, so the in-sandbox agent can give the developer an accurate hint instead of a misleading empty list. - Stop echoing the OCSF `message` field in the per-denial summary. The proxy's denial messages can include the request path with query string (e.g., `?access_token=...`); the structured `host`/`port`/`method`/ `path`/`binary` fields carry everything the agent needs to draft a proposal, and `path` is sourced from `http_request.url.path` which already excludes the query string. - Cap `read_request_body` at a 15s timeout. Bounds slowloris-style stalls from a misbehaving in-sandbox process. The proxy listener only accepts loopback connections so practical impact is small, but this is cheap defense-in-depth. New tests cover the missing-log signal and the message-redaction guarantee.

…_DIR Two small hardening passes on the policy management demo: - `fail()` now pipes the agent log tail through a redactor that masks the GitHub token and Codex credential triple before printing. Codex itself is well-behaved about not echoing the token, but a misbehaving tool call could leak it; this is a final safety net before the log hits the developer's terminal (and any clipboard or chat history that follows). - `validate_env` now regex-checks DEMO_FILE_DIR with the same allow-list the other path-shaped variables use. The value is interpolated through sed with `|` as the delimiter when rendering the agent task; rejecting unsupported characters keeps the templating predictable and stops a user-supplied value from breaking out into a shell context.

Addresses review feedback that the deny body's `next_steps` array and the route table could drift apart. The route paths and skill location now live as `pub const`s in `policy_local.rs` and feed both: - the dispatcher in `route_request` that matches against them - a new `agent_next_steps()` helper that builds the JSON the L7 deny body embeds `l7/rest.rs::deny_response_body` calls `policy_local::agent_next_steps()` instead of inlining the array, so adding or renaming a route is a one-line change in `policy_local.rs` and the agent contract follows automatically.

Previously /v1/denials parsed `/var/log/openshell-ocsf.*.log` (OCSF JSONL) and returned structured per-event objects. JSONL is opt-in via `ocsf_json_enabled`, so the endpoint returned an empty list with a "log not enabled" hint by default — agents had to navigate a setup step before the inspect-recent-denials guidance was useful. Switch to reading the shorthand log at `/var/log/openshell.*.log`, which is always-on and the same human-readable format `openshell logs` displays. The endpoint now returns raw shorthand lines (newest first) — the agent reads them directly, no field parsing. Tradeoffs: - Removes the JSONL-on-by-default debate: shorthand is already on, no defaults change. - Updating shorthand is a single-file change in this repo; no schema rev needed when we want to add fields. Implementation: - `read_recent_denial_lines` walks shorthand log files newest-first, filters lines with ` OCSF ` AND ` DENIED ` (the OCSF action label, uppercase, space-bounded). - `collect_shorthand_log_files` matches `openshell.<date>.log`; the trailing dot in `SHORTHAND_LOG_PREFIX = "openshell."` excludes `openshell-ocsf.<date>.log` so JSONL-on doesn't bleed into responses. - 4096-byte cap per surfaced line as defense against pathological inputs. - Skill doc updated to reflect that `/v1/denials` returns raw shorthand lines, not structured fields. Defense-in-depth on query-string secrets: - `redact_query_strings` strips `?<query>` to `?[redacted]` from each surfaced line. The L7 relay path emits OCSF events using `redacted_target` (secret-placeholder redaction), but the FORWARD deny path in `proxy.rs` populates `OcsfUrl::new("http", host, path, port)` and `.message(...)` with the raw request path — query string included. Stripping queries at the consumer guards `/v1/denials` regardless of whether the upstream emit sites are tightened. The on-disk log is not rewritten by this change; that is a separate hardening task tracked for the FORWARD path emit sites in proxy.rs. - `truncate_at_char_boundary` is UTF-8 safe; redaction runs before truncation so a cut cannot slice mid-secret. Tests: - `recent_denials_returns_newest_first_from_shorthand_lines` covers the happy path with mixed allowed/denied/non-OCSF lines. - `recent_denials_skips_jsonl_log_files` confirms JSONL files don't surface even if present. - `recent_denials_truncates_pathological_lines` covers the cap. - `is_ocsf_denial_line_filters_correctly` covers the line-level filter. - `redact_query_strings_removes_query_from_url_token` and `redact_query_strings_removes_query_in_reason_tag` cover the redaction in both URL token and `[reason:...]` contexts. - `truncate_at_char_boundary_does_not_panic_on_multibyte` covers the UTF-8 safety.

Post-rebase fixups after #1083 (GraphQL L7 inspection) landed on main and introduced new fields on the proto types this branch constructs: - `crates/openshell-sandbox/src/l7/relay.rs`: two `deny_with_redacted_target` call sites (REST and GraphQL relay deny paths) now pass the `DenyResponseContext` argument that `rest::send_deny_response` expects. Both sites pass `host`, `port`, and `binary` from the existing `L7EvalContext`, matching the pattern used at the primary deny site. - `crates/openshell-sandbox/src/policy_local.rs`: `L7Allow`, `L7DenyRule`, and `NetworkEndpoint` proto initializers now populate the new GraphQL and path-scoping fields with empty defaults. Agent-authored proposals via `policy.local` target REST/SQL/L4 today; GraphQL operation matching is set on the gateway side or via direct YAML, so empty defaults are correct here. No behavior change. `cargo test -p openshell-sandbox --lib` (650 tests) and `cargo clippy -p openshell-sandbox --lib --tests -- -D warnings` clean.

The agent-driven policy proposal surface delivered by this PR (skill install, `policy.local` API, `next_steps` array on L7 deny bodies) is now opt-in via the new `agent_policy_proposals_enabled` setting. Default false. Same shape as `providers_v2_enabled`: registered in `openshell-core::settings`, sandbox-level, hot-toggleable via the existing settings poll loop. Why: the surface is a novel agent-controlled mutation point in every sandbox. The per-proposal developer approval gate is a correctness control, but it doesn't address "should this sandbox have an agent-authoring API at all" — compliance teams may want that question closed. The flag is the second gate. Implementation: - New registry entry + `AGENT_POLICY_PROPOSALS_ENABLED_KEY` constant in `openshell-core::settings`. - `lib.rs`: process-wide `OnceLock<Arc<AtomicBool>>` mirroring the `OCSF_CTX` pattern. `agent_proposals_enabled()` is the single read point. - Initial settings fetch added to `run_sandbox` so skill install honors the flag at startup (not just on the poll loop's first tick). - Skill install in `run_sandbox` is gated on the flag. - `policy_local::route_request` returns `404 feature_disabled` for all routes when the flag is off — including the otherwise-public `current_policy` and `denials` routes. When the surface is off it's off entirely. - `policy_local::agent_next_steps` returns an empty array when the flag is off so deny bodies don't advertise routes that 404. - Poll loop updates the atomic on each tick, lazily installs the skill on a false→true transition (no claw-back on true→false; stale skill on disk is harmless because route + next_steps gate on the live atom). Tests: - Shared `test_helpers::ProposalsFlagGuard` mutex+atomic guard for the process-wide flag, used across `policy_local::tests` and `l7::rest::tests`. - New: `agent_next_steps_returns_empty_when_flag_off`, `agent_next_steps_returns_full_array_when_flag_on`, `route_request_returns_feature_disabled_when_flag_off`. - Updated existing tests that exercise the deny body or the route dispatcher to set the flag on first. - Full sandbox lib test suite: 653 pass, clippy clean. Demo and e2e: - `examples/agent-driven-policy-management/demo.sh` and `e2e/policy-advisor/test.sh` now snapshot the prior global value of the setting, set it to true before sandbox creation (so the supervisor's initial poll picks it up), and restore on exit (delete if previously unset, otherwise write the prior value back). Docs: - RFC 0001 MVP-implementation note documents the flag, default, and intended soft-launch posture.

…ctive id The smoke harness for the agent feedback loop caught a real bug in the gateway: SubmitPolicyAnalysis's response carried a chunk_id that was never persisted whenever the SQL ON CONFLICT path fired. Two failure modes, both load-bearing: - Agent-authored proposals targeting the same host/port/binary (e.g. the redraft-after-rejection loop) silently folded into one row and any RejectDraftChunk by the new chunk_id failed with "chunk not found." Latent since #1151, surfaced by #1094 returning chunk_ids. - Mechanistic mode had the same class of bug — the dedup fold-in is the intended behavior there, but the response still advertised the newly-generated UUID instead of the existing row's id. Less visible because no current caller reads mechanistic chunk_ids back, but the proto contract was violated either way. Fix in three parts: - put_draft_chunk now takes Option<&str> dedup_key explicitly and returns the effective row id (via RETURNING). None binds NULL to the dedup_key column, which bypasses the partial-index ON CONFLICT path entirely. Caller-decides semantics replace store-side magic. - handle_submit_policy_analysis picks dedup_key per chunk using an allowlist (only "mechanistic" dedups) and pushes the returned effective_id to accepted_chunk_ids. New modes default to no-dedup so a misconfigured caller cannot silently lose proposals. - The two-copy draft_chunk_dedup_key helper consolidated to one observation_dedup_key in policy_store.rs with a doc comment. Tests: - agent_authored_submits_for_same_endpoint_do_not_dedup pins the redraft-loop contract: two intentional submissions with the same host/port/binary get distinct chunk_ids, both findable via GetDraftPolicy, both rejectable by id. - mechanistic_submits_for_same_endpoint_dedup_into_one_chunk locks in the observation-mode dedup AND asserts both submits return the same effective_id — would have caught the deeper bug. Proto: SubmitPolicyAnalysisRequest.analysis_mode doc updated to describe the actual semantics (mechanistic dedups, agent_authored and unknown modes do not). Signed-off-by: Alexander Watson <zredlined@gmail.com>

The narrated demo (examples/agent-driven-policy-management) has been the public face of this feature since #1151. Its agent prompt told Codex to retry the original PUT every few seconds for up to 120 seconds — a polling workaround for the missing /wait endpoint that this branch shipped. Update the demo to exercise /wait so the canonical reading of the feature reflects the actual UX win. - agent-task.md: step 4 is now "call /wait, branch on status" with the three outcomes spelled out (approved → retry once; rejected → read rejection_reason and revise or stop; pending+timed_out → re-issue /wait once, do NOT busy-loop or shorten the timeout). Also makes explicit that the demo submits one rule per proposal so accepted_chunk_ids[0] is the safe single id to wait on. - demo.sh: header docstring rewritten as a six-step loop that mirrors the README. narrate_sandbox_workflow drops its parallel numbering and uses bullets (the runtime narration is the agent's sub-actions, not a separate decomposition of the loop). Approve step header and success message now reference /wait waking the agent, not "policy hot-reload retry." - README.md: top-of-file flow expanded from 5 to 6 steps to include the /wait call and chunk_id capture; "Going further" section now describes both regression scripts and the boundary between them (real-GitHub retry vs. synthetic /wait wire test). Slow-path qualifier corrected from "image pull on first run" to "sandbox cold-start (SSH bring-up plus Codex install)". - wait-smoke.sh header rewritten to make it unambiguous this is a regression, NOT a tutorial, with explicit prereq commands instead of prereq descriptions, and a pointer at demo.sh for the narrated story. No code paths change; this is the readability pass. Signed-off-by: Alexander Watson <zredlined@gmail.com>

…ctive id The smoke harness for the agent feedback loop caught a real bug in the gateway: SubmitPolicyAnalysis's response carried a chunk_id that was never persisted whenever the SQL ON CONFLICT path fired. Two failure modes, both load-bearing: - Agent-authored proposals targeting the same host/port/binary (e.g. the redraft-after-rejection loop) silently folded into one row and any RejectDraftChunk by the new chunk_id failed with "chunk not found." Latent since #1151, surfaced by #1094 returning chunk_ids. - Mechanistic mode had the same class of bug — the dedup fold-in is the intended behavior there, but the response still advertised the newly-generated UUID instead of the existing row's id. Less visible because no current caller reads mechanistic chunk_ids back, but the proto contract was violated either way. Fix in three parts: - put_draft_chunk now takes Option<&str> dedup_key explicitly and returns the effective row id (via RETURNING). None binds NULL to the dedup_key column, which bypasses the partial-index ON CONFLICT path entirely. Caller-decides semantics replace store-side magic. - handle_submit_policy_analysis picks dedup_key per chunk using an allowlist (only "mechanistic" dedups) and pushes the returned effective_id to accepted_chunk_ids. New modes default to no-dedup so a misconfigured caller cannot silently lose proposals. - The two-copy draft_chunk_dedup_key helper consolidated to one observation_dedup_key in policy_store.rs with a doc comment. Tests: - agent_authored_submits_for_same_endpoint_do_not_dedup pins the redraft-loop contract: two intentional submissions with the same host/port/binary get distinct chunk_ids, both findable via GetDraftPolicy, both rejectable by id. - mechanistic_submits_for_same_endpoint_dedup_into_one_chunk locks in the observation-mode dedup AND asserts both submits return the same effective_id — would have caught the deeper bug. Proto: SubmitPolicyAnalysisRequest.analysis_mode doc updated to describe the actual semantics (mechanistic dedups, agent_authored and unknown modes do not). Signed-off-by: Alexander Watson <zredlined@gmail.com>

The narrated demo (examples/agent-driven-policy-management) has been the public face of this feature since #1151. Its agent prompt told Codex to retry the original PUT every few seconds for up to 120 seconds — a polling workaround for the missing /wait endpoint that this branch shipped. Update the demo to exercise /wait so the canonical reading of the feature reflects the actual UX win. - agent-task.md: step 4 is now "call /wait, branch on status" with the three outcomes spelled out (approved → retry once; rejected → read rejection_reason and revise or stop; pending+timed_out → re-issue /wait once, do NOT busy-loop or shorten the timeout). Also makes explicit that the demo submits one rule per proposal so accepted_chunk_ids[0] is the safe single id to wait on. - demo.sh: header docstring rewritten as a six-step loop that mirrors the README. narrate_sandbox_workflow drops its parallel numbering and uses bullets (the runtime narration is the agent's sub-actions, not a separate decomposition of the loop). Approve step header and success message now reference /wait waking the agent, not "policy hot-reload retry." - README.md: top-of-file flow expanded from 5 to 6 steps to include the /wait call and chunk_id capture; "Going further" section now describes both regression scripts and the boundary between them (real-GitHub retry vs. synthetic /wait wire test). Slow-path qualifier corrected from "image pull on first run" to "sandbox cold-start (SSH bring-up plus Codex install)". - wait-smoke.sh header rewritten to make it unambiguous this is a regression, NOT a tutorial, with explicit prereq commands instead of prereq descriptions, and a pointer at demo.sh for the narrated story. No code paths change; this is the readability pass. Signed-off-by: Alexander Watson <zredlined@gmail.com>

…ctive id The smoke harness for the agent feedback loop caught a real bug in the gateway: SubmitPolicyAnalysis's response carried a chunk_id that was never persisted whenever the SQL ON CONFLICT path fired. Two failure modes, both load-bearing: - Agent-authored proposals targeting the same host/port/binary (e.g. the redraft-after-rejection loop) silently folded into one row and any RejectDraftChunk by the new chunk_id failed with "chunk not found." Latent since #1151, surfaced by #1094 returning chunk_ids. - Mechanistic mode had the same class of bug — the dedup fold-in is the intended behavior there, but the response still advertised the newly-generated UUID instead of the existing row's id. Less visible because no current caller reads mechanistic chunk_ids back, but the proto contract was violated either way. Fix in three parts: - put_draft_chunk now takes Option<&str> dedup_key explicitly and returns the effective row id (via RETURNING). None binds NULL to the dedup_key column, which bypasses the partial-index ON CONFLICT path entirely. Caller-decides semantics replace store-side magic. - handle_submit_policy_analysis picks dedup_key per chunk using an allowlist (only "mechanistic" dedups) and pushes the returned effective_id to accepted_chunk_ids. New modes default to no-dedup so a misconfigured caller cannot silently lose proposals. - The two-copy draft_chunk_dedup_key helper consolidated to one observation_dedup_key in policy_store.rs with a doc comment. Tests: - agent_authored_submits_for_same_endpoint_do_not_dedup pins the redraft-loop contract: two intentional submissions with the same host/port/binary get distinct chunk_ids, both findable via GetDraftPolicy, both rejectable by id. - mechanistic_submits_for_same_endpoint_dedup_into_one_chunk locks in the observation-mode dedup AND asserts both submits return the same effective_id — would have caught the deeper bug. Proto: SubmitPolicyAnalysisRequest.analysis_mode doc updated to describe the actual semantics (mechanistic dedups, agent_authored and unknown modes do not). Signed-off-by: Alexander Watson <zredlined@gmail.com>

The narrated demo (examples/agent-driven-policy-management) has been the public face of this feature since #1151. Its agent prompt told Codex to retry the original PUT every few seconds for up to 120 seconds — a polling workaround for the missing /wait endpoint that this branch shipped. Update the demo to exercise /wait so the canonical reading of the feature reflects the actual UX win. - agent-task.md: step 4 is now "call /wait, branch on status" with the three outcomes spelled out (approved → retry once; rejected → read rejection_reason and revise or stop; pending+timed_out → re-issue /wait once, do NOT busy-loop or shorten the timeout). Also makes explicit that the demo submits one rule per proposal so accepted_chunk_ids[0] is the safe single id to wait on. - demo.sh: header docstring rewritten as a six-step loop that mirrors the README. narrate_sandbox_workflow drops its parallel numbering and uses bullets (the runtime narration is the agent's sub-actions, not a separate decomposition of the loop). Approve step header and success message now reference /wait waking the agent, not "policy hot-reload retry." - README.md: top-of-file flow expanded from 5 to 6 steps to include the /wait call and chunk_id capture; "Going further" section now describes both regression scripts and the boundary between them (real-GitHub retry vs. synthetic /wait wire test). Slow-path qualifier corrected from "image pull on first run" to "sandbox cold-start (SSH bring-up plus Codex install)". - wait-smoke.sh header rewritten to make it unambiguous this is a regression, NOT a tutorial, with explicit prereq commands instead of prereq descriptions, and a pointer at demo.sh for the narrated story. No code paths change; this is the readability pass. Signed-off-by: Alexander Watson <zredlined@gmail.com>

* feat(policy): plumb chunk_ids and rejection_reason through proposal pipeline Prereq plumbing for the agent revise-and-resubmit loop. Two narrow additive proto changes unblock the upcoming /wait endpoint (#1092), prover validation badge (#1097), and reject --guidance surfaces (#1098). - SubmitPolicyAnalysisResponse: add accepted_chunk_ids so the in-sandbox agent gets handles to watch its proposals. Surfaced through the typed grpc_client wrapper and policy.local's POST /v1/proposals 202 body. Closes #1094. - PolicyChunk + StoredDraftChunk + DraftChunkPayload: add validation_result (gateway prover verdict, populated by #1097) and rejection_reason (operator free-form text). Both plain strings; no enums, no parsing on the read path. Closes #1096. - RejectDraftChunk now persists the existing reason field into the chunk's rejection_reason so it round-trips back to the agent via GetDraftPolicy. UndoDraftChunk clears it on the way back to pending so consumers cannot read a stale guidance string from a prior reject -> re-approve -> undo cycle. Whole surface stays gated behind agent_policy_proposals_enabled. Two focused tests cover the round-trip and the undo-clears guarantee. Signed-off-by: Alexander Watson <zredlined@gmail.com> * feat(sandbox): add /v1/proposals/{id} and /wait long-poll to policy.local The agent feedback channel back from policy.local. Two new routes let the in-sandbox agent learn its proposal's outcome on a single blocking HTTP call — zero LLM tokens during the wait. - GET /v1/proposals/{chunk_id} returns the chunk's current state in one gateway call. - GET /v1/proposals/{chunk_id}/wait?timeout=<s> blocks until the chunk transitions out of pending. Default 60s, clamped [1, 300]. Agent re-issues on timeout to extend. Response carries the chunk's status plus the two feedback fields shipped in the prereq commit: rejection_reason (free-form reviewer text) and validation_result (gateway prover verdict, empty until #1097). On timeout: same shape with timed_out: true so the agent can disambiguate without parsing. Wait handler short-polls GetDraftPolicy every 1s inside the request with a tokio::time::Instant deadline. One gateway connection is opened per request and reused across all polls, so a 60s wait does one TLS handshake instead of sixty. A future commit can swap the loop body for a tokio::sync::broadcast driven by a watcher task — the agent-visible contract (URL, query, response shape) is independent of the polling implementation. All routes stay behind agent_policy_proposals_enabled. Closes #1092. Signed-off-by: Alexander Watson <zredlined@gmail.com> * docs(sandbox): teach policy_advisor skill the wait + redraft loop The agent-facing instructions for the feedback loop. The endpoints exist; this is the doc that makes them usable. policy_advisor.md gains: - API entries for GET /v1/proposals/{chunk_id} and /wait?timeout=<s>, including the field semantics (status, rejection_reason, validation_result, timed_out). - A note on the submit response's accepted_chunk_ids / rejection_reasons split so the agent handles partial acceptance. - Step 6 saves the chunk_ids and addresses any submit-time rejections before waiting. - Step 7 walks the four wait outcomes: approved (retry, with the honest "may still fail" caveat), rejected (read rejection_reason AND validation_result; address whichever has content), still-pending with timed_out (re-call), non-2xx (surface, do not retry). skills.rs gains two assertions on the skill content so a future edit cannot drop the wait endpoint or the rejection_reason directive silently. Closes #1095. Signed-off-by: Alexander Watson <zredlined@gmail.com> * test(policy-advisor): add end-to-end smoke for the agent feedback loop A focused smoke that exercises the new policy.local /wait endpoint on a live gateway + sandbox, separate from the existing no-LLM regression harness (which still drives the OLD retry-with-bash-loop recovery pattern). Two flows: - Flow A — approve-and-retry: agent submits, /wait blocks, host runs `openshell rule approve`, /wait returns status=approved. Confirms the happy path round-trip latency. - Flow B — reject-with-guidance: agent submits, /wait blocks, host runs `openshell rule reject --reason "..."`, /wait returns status=rejected with the exact reviewer text in rejection_reason. Confirms the free-form guidance contract round-trips through the agent feedback channel. No GitHub credentials needed — proposals are synthetic and never trigger outbound traffic. Both flows expect agent_policy_proposals_enabled=true and a running gateway. Adds three cases to sandbox-runner.sh: submit-test-proposal (no GH deps), proposal-status, proposal-wait. The existing put-file and submit-proposal cases used by test.sh are untouched. Signed-off-by: Alexander Watson <zredlined@gmail.com> * fix(policy-advisor): surface real CLI errors from wait-smoke preflight The preflight piped openshell's stderr to /dev/null and relied on jq to default the missing setting key to "<unset>", but under `set -euo pipefail` a non-zero exit from openshell makes the whole pipeline fail and the command substitution exits the script silently before the intended fail() message can print. Capture stderr explicitly, check the CLI exit code, and surface the real error plus the expected fix (port-forward + gateway add + select) when the CLI cannot reach the gateway. Signed-off-by: Alexander Watson <zredlined@gmail.com> * fix(policy-advisor): pass --json to settings get in wait-smoke preflight `openshell settings get --global` defaults to a human-readable table; jq cannot parse it and the preflight died with a numeric-literal error. Pass --json so jq gets actual JSON. Also touched up the suggested recovery commands in the preflight error to match the real CLI shape (`gateway add <endpoint> --name <name>` and the env-var override warning). Signed-off-by: Alexander Watson <zredlined@gmail.com> * fix(policy): dedup draft chunks only in mechanistic mode; return effective id The smoke harness for the agent feedback loop caught a real bug in the gateway: SubmitPolicyAnalysis's response carried a chunk_id that was never persisted whenever the SQL ON CONFLICT path fired. Two failure modes, both load-bearing: - Agent-authored proposals targeting the same host/port/binary (e.g. the redraft-after-rejection loop) silently folded into one row and any RejectDraftChunk by the new chunk_id failed with "chunk not found." Latent since #1151, surfaced by #1094 returning chunk_ids. - Mechanistic mode had the same class of bug — the dedup fold-in is the intended behavior there, but the response still advertised the newly-generated UUID instead of the existing row's id. Less visible because no current caller reads mechanistic chunk_ids back, but the proto contract was violated either way. Fix in three parts: - put_draft_chunk now takes Option<&str> dedup_key explicitly and returns the effective row id (via RETURNING). None binds NULL to the dedup_key column, which bypasses the partial-index ON CONFLICT path entirely. Caller-decides semantics replace store-side magic. - handle_submit_policy_analysis picks dedup_key per chunk using an allowlist (only "mechanistic" dedups) and pushes the returned effective_id to accepted_chunk_ids. New modes default to no-dedup so a misconfigured caller cannot silently lose proposals. - The two-copy draft_chunk_dedup_key helper consolidated to one observation_dedup_key in policy_store.rs with a doc comment. Tests: - agent_authored_submits_for_same_endpoint_do_not_dedup pins the redraft-loop contract: two intentional submissions with the same host/port/binary get distinct chunk_ids, both findable via GetDraftPolicy, both rejectable by id. - mechanistic_submits_for_same_endpoint_dedup_into_one_chunk locks in the observation-mode dedup AND asserts both submits return the same effective_id — would have caught the deeper bug. Proto: SubmitPolicyAnalysisRequest.analysis_mode doc updated to describe the actual semantics (mechanistic dedups, agent_authored and unknown modes do not). Signed-off-by: Alexander Watson <zredlined@gmail.com> * docs(examples): retarget policy-management demo at the /wait endpoint The narrated demo (examples/agent-driven-policy-management) has been the public face of this feature since #1151. Its agent prompt told Codex to retry the original PUT every few seconds for up to 120 seconds — a polling workaround for the missing /wait endpoint that this branch shipped. Update the demo to exercise /wait so the canonical reading of the feature reflects the actual UX win. - agent-task.md: step 4 is now "call /wait, branch on status" with the three outcomes spelled out (approved → retry once; rejected → read rejection_reason and revise or stop; pending+timed_out → re-issue /wait once, do NOT busy-loop or shorten the timeout). Also makes explicit that the demo submits one rule per proposal so accepted_chunk_ids[0] is the safe single id to wait on. - demo.sh: header docstring rewritten as a six-step loop that mirrors the README. narrate_sandbox_workflow drops its parallel numbering and uses bullets (the runtime narration is the agent's sub-actions, not a separate decomposition of the loop). Approve step header and success message now reference /wait waking the agent, not "policy hot-reload retry." - README.md: top-of-file flow expanded from 5 to 6 steps to include the /wait call and chunk_id capture; "Going further" section now describes both regression scripts and the boundary between them (real-GitHub retry vs. synthetic /wait wire test). Slow-path qualifier corrected from "image pull on first run" to "sandbox cold-start (SSH bring-up plus Codex install)". - wait-smoke.sh header rewritten to make it unambiguous this is a regression, NOT a tutorial, with explicit prereq commands instead of prereq descriptions, and a pointer at demo.sh for the narrated story. No code paths change; this is the readability pass. Signed-off-by: Alexander Watson <zredlined@gmail.com> * fix(examples): pass --yes on demo.sh's global setting writes Global setting updates require explicit confirmation in non-interactive mode; demo.sh's enable_agent_proposals and the cleanup restore path were missing --yes and hard-failed the preflight. Pre-existing issue that surfaced now that more of the demo runs through this path. No other behavior change. Signed-off-by: Alexander Watson <zredlined@gmail.com> * feat(sandbox): emit OCSF audit events for policy proposal lifecycle The demo's policy decision trace previously showed only the proxy enforcement story (HTTP:PUT DENIED, CONFIG:LOADED, HTTP:PUT ALLOWED). It was silent about who proposed what or who decided what — the audit-trail receipts for the agent feedback loop were missing. policy.local now emits sandbox-side OCSF events at the observation moments, into the same stream as the existing CONFIG:LOADED: - CONFIG:PROPOSED on submit_proposal acceptance. Per accepted chunk: the message names the chunk_id, target endpoint, L7 method/path, and binary so the trace correlates against the inbox card via chunk_id. - CONFIG:APPROVED on /wait observation of approved status. - CONFIG:REJECTED on /wait observation of rejected status. Carries the reviewer's free-form rejection_reason in the message AND as an unmapped field, both sanitized (control chars stripped, capped at 200 chars with an ellipsis marker). The agent still reads the raw text via GET /v1/proposals/{id}; sanitization is audit-side only, per AGENTS.md's no-secrets-in-OCSF rule. The submit path defends the audit_summaries / accepted_chunk_ids index pairing against a future gateway change that compresses past rejected chunks (the proto doesn't promise 1:1 ordering with the request). Today client-side validation makes the lengths always match; if they don't, the pairing falls back to a generic per-id event rather than mis-attribute. The wait handler's emit site fires once per terminal-status observation. Multiple concurrent waiters on the same chunk would each emit one event; acceptable for single-waiter-per-chunk demos and the right place to dedup is the SIEM. demo.sh's trace filter now surfaces the four CONFIG: events alongside HTTP:PUT, so the trace at the end of every run tells the full story from deny to allow via propose -> approve. wait-smoke.sh's prereq notes recommend redirecting kubectl port-forward output so its "Handling connection for 8090" lines don't bleed into demo narration. Three new unit tests on the sandbox-side helpers — summary builder happy path, fallback, and the rejection_reason sanitizer. Signed-off-by: Alexander Watson <zredlined@gmail.com> * feat(policy): /wait awaits local policy reload; demo auto-approves redrafts Three things in one commit, all surfaced by running the demo end-to-end against a real gateway and finding the agent had to draft a broader second proposal. 1. /wait race fix. Previously /wait returned `approved` the moment it observed the gateway's chunk status flip, but the local supervisor reloads policy on its own poll cycle (~10s in practice). The agent's retry would race the reload and hit the still-old policy, getting denied. Codex then drafted a broader rule and re-submitted — sound agent behavior, but not what /wait should provoke. Now /wait captures the local policy version at start, and after observed-approved waits for the supervisor to load a strictly-newer version before returning. Bounded by the caller's deadline; best-effort return if the deadline elapses without the version bumping. Two new unit tests pin the happy path and the deadline-clamped fallback. 2. demo.sh auto-approve loop. Replaces approve_when_pending + wait_for_agent with one approve_pending_until_agent_exits function that keeps watching for pending chunks and approving them until the agent process exits (or the configured timeout). Defense in depth against future redraft scenarios for any reason; today (post-fix #1) the agent should only submit one proposal per task, but we don't want to hang silently if it does submit more. 3. UX. Step headers now carry "[t+1.2s]" relative timestamps so reading the run output makes latency visible (the demo's whole point is the wait is cheap — surface that). A spin_wait helper renders an ASCII spinner during the watch loop so the demo never looks frozen on a TTY. Falls back to plain sleep on non-TTY contexts. Closes the race condition diagnosed from the trace timing where the gateway approved at t+0, sandbox observed at t+0.3s, but the supervisor didn't load v2 until t+9.4s — well after the agent had already retried and been denied. Signed-off-by: Alexander Watson <zredlined@gmail.com> * fix(sandbox): /wait detects policy reload by content, not the schema version The previous attempt at the /wait-after-approve race fix compared `SandboxPolicy.version` between /wait start and the policy reload — but that field is the *schema* version (constant 1), not a revision counter. Every comparison was `current(1) > baseline(1) == false`, so the wait blocked until the agent's 300s timeout regardless of whether the supervisor had actually reloaded. The demo SSH connection then timed out around the 240s mark. Diagnosed from a live run's OCSF trace: supervisor pulled v2 at +8.5s after approval (CONFIG:LOADED), but the sandbox-side CONFIG:APPROVED that my /wait emits didn't fire until +304s — exactly at the 300s deadline. Fix: compare the whole policy via prost's derived PartialEq. Any field change (network_policies map being the only one that actually mutates today) flips equality. A clone-per-200ms-tick on a few-KB proto is cheap inside the bounded wait window. Tests rewritten to match the new contract: the supervisor-reload fixture now keeps `version: 1` constant and changes `network_policies` contents, mirroring the exact failure mode from the live run. Signed-off-by: Alexander Watson <zredlined@gmail.com> * fix(examples): redact tokens with python literal-string replace, not sed The sed-based redact_log in demo.sh broke when one of the auth tokens contained a character that conflicted with sed's pattern parser ("unterminated substitute pattern" on the Codex JWT). The whole log tail then blanks on failure, hiding the very failure context we're trying to surface. Switch to a python subprocess that takes the tokens via argv and does literal str.replace. No regex, no delimiter games, no truncation. Signed-off-by: Alexander Watson <zredlined@gmail.com> * fix(sandbox): scope /wait reload check to the approved rule Reviewer (John Myers) flagged two failure modes in the prior whole-policy fingerprint approach used by policy.local /wait: - False sleep: when the supervisor reloads between two /wait calls (the skill tells the agent to re-issue on timed_out), the new call snapshots the already-updated policy as baseline and burns the full timeout waiting for a change that never comes. - False wakeup: any unrelated reload (other agent's approval, settings change) flips the diff, but the chunk's actual rule may not be loaded yet — the agent retries and hits policy_denied for no real signal. Replace the diff with rule-coverage. New public helper openshell_policy::policy_covers_rule reuses endpoints_overlap (so it matches add_rule's merge semantics, including the fold-into-existing-key case) plus an L7 allow check on method/path (so an existing endpoint that doesn't yet contain the proposed method doesn't signal coverage). Add policy_reloaded: true|false to the /wait response on approve, with a 500ms floor on the reload-wait phase so approvals arriving near the deadline still get a fair shot at reloaded=true. Update the policy_advisor skill to branch on it: reloaded=true → retry; reloaded=false → re-issue /wait once with timeout=30, then surface to user. Don't loop tightly. Tests: - 9 new unit tests in openshell-policy pinning coverage semantics (L4-only, L7 method gap, fold-into-existing-key, empty binaries). - 4 new tokio tests in policy_local mirroring John's exact scenarios. - wait-smoke.sh asserts policy_reloaded=true on Flow A. Signed-off-by: Alexander Watson <zredlined@gmail.com> * fix(server): make GetDraftPolicy dual-auth so /wait works under OIDC policy.local calls GetDraftPolicy from inside the sandbox supervisor via the sandbox gRPC client, which authenticates with the shared x-sandbox-secret. GetDraftPolicy was listed only in the Bearer-auth scope table (config:read) and was not in SANDBOX_SECRET_METHODS or DUAL_AUTH_METHODS, so OIDC-enabled gateways rejected those calls and the /wait long-poll surfaced gateway_lookup_failed. Local/no-OIDC setups happened to work because the auth check is short-circuited. Add GetDraftPolicy to DUAL_AUTH_METHODS, matching the existing GetSandboxConfig pattern (called by both CLI reviewer surfaces with Bearer and the sandbox supervisor with x-sandbox-secret). Dual-auth short-circuits the scope check for sandbox-secret callers, so the config:read entry in authz.rs continues to gate Bearer-only flows. Mirror the openshell_get_sandbox_config_is_dual_auth assertion for GetDraftPolicy. Note: ssh_handshake_secret is server-wide, not per-sandbox, so a sandbox-secret caller can today name any sandbox in a SubmitPolicyAnalysis request — and now in a GetDraftPolicy request. The exposure is symmetric with the existing SANDBOX_SECRET_METHODS pattern. Filed as a follow-up: per-sandbox secret binding, tracked separately. Signed-off-by: Alexander Watson <zredlined@gmail.com> * fix(ci): address rebased check failures Signed-off-by: Alexander Watson <zredlined@gmail.com> --------- Signed-off-by: Alexander Watson <zredlined@gmail.com>

zredlined self-assigned this May 4, 2026

zredlined added state:in-progress Work is currently in progress topic:l7 Application-layer policy and inspection work area:policy Policy engine and policy lifecycle work labels May 4, 2026

johntmyers reviewed May 4, 2026

View reviewed changes

Comment thread crates/openshell-sandbox/src/skills/policy_advisor.md

johntmyers reviewed May 4, 2026

View reviewed changes

Comment thread crates/openshell-sandbox/src/l7/rest.rs Outdated

zredlined marked this pull request as ready for review May 6, 2026 19:37

zredlined requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners May 6, 2026 19:37

zredlined requested a review from johntmyers May 6, 2026 19:37

zredlined mentioned this pull request May 6, 2026

OpenShell Agent-Driven Policy Management #1062

Open

zredlined force-pushed the feat/agent-driven-policy-management branch from b357c56 to 84f4e29 Compare May 7, 2026 15:25

zredlined force-pushed the feat/agent-driven-policy-management branch from 84f4e29 to f48f4c8 Compare May 8, 2026 03:04

johntmyers added the test:e2e Requires end-to-end coverage label May 8, 2026

johntmyers previously approved these changes May 8, 2026

View reviewed changes

zredlined added 3 commits May 8, 2026 13:45

docs(rfc): add agent-driven policy management

05c721e

docs(rfc): switch policy MVP to local API

d869686

docs(rfc): clarify policy advisor skill and local logs

e82b535

zredlined and others added 8 commits May 8, 2026 13:47

test(policy-advisor): require proposal opt-in for e2e

6f79bbb

refactor(sandbox): group policy poll loop state

65ed2b9

johntmyers dismissed their stale review via 65ed2b9 May 8, 2026 20:58

johntmyers force-pushed the feat/agent-driven-policy-management branch from 63b5288 to 65ed2b9 Compare May 8, 2026 20:58

test(e2e): isolate Kubernetes user namespace test

2b3c64e

johntmyers self-requested a review May 9, 2026 00:13

johntmyers approved these changes May 9, 2026

View reviewed changes

johntmyers merged commit 1c79b21 into main May 9, 2026
31 of 33 checks passed

johntmyers deleted the feat/agent-driven-policy-management branch May 9, 2026 00:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: agent-driven policy management MVP#1151

feat: agent-driven policy management MVP#1151
johntmyers merged 23 commits into
mainfrom
feat/agent-driven-policy-management

zredlined commented May 4, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented May 4, 2026

Uh oh!

Uh oh!

Uh oh!

zredlined commented May 7, 2026

Uh oh!

zredlined commented May 8, 2026

Uh oh!

github-actions Bot commented May 8, 2026

Uh oh!

johntmyers commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zredlined commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Demo output

Changes

feat(sandbox): wire policy.local denials to OCSF JSONL log

feat(cli): show L7 protocol/method/path in rule get output

refactor(examples): rewrite policy demo as Codex-default loop

Testing

Update (review iteration, May 6)

Checklist

Out of scope (deferred per the build plan)

Side note

Uh oh!

copy-pr-bot Bot commented May 4, 2026

Uh oh!

Uh oh!

Uh oh!

zredlined commented May 7, 2026

Uh oh!

zredlined commented May 8, 2026

Uh oh!

github-actions Bot commented May 8, 2026

Uh oh!

johntmyers commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zredlined commented May 4, 2026 •

edited

Loading

`feat(sandbox): wire policy.local denials to OCSF JSONL log`

`feat(cli): show L7 protocol/method/path in rule get output`

`refactor(examples): rewrite policy demo as Codex-default loop`